
feat(docker): production-optimized multi-stage Dockerfile#90

Closed
pratikbin wants to merge 254 commits into chopratejas:main from pratikbin:feat/production-dockerfile

Conversation

@pratikbin
Contributor

Summary

  • Multi-stage build: build deps (gcc/g++) in builder stage, runtime is clean python:3.11-slim with only curl
  • uv with lockfile instead of raw pip for deterministic, fast installs with build cache mounts
  • Dep layer cached independently from source code — source-only rebuilds drop from ~37s to ~4s
  • Non-root: runs as headroom:1000 instead of root
  • Proper ENTRYPOINT/CMD separation for clean docker-compose overrides
  • CI workflow for multi-arch (linux/amd64 + linux/arm64) image publishing to GHCR
  • Expanded .dockerignore to exclude JS artifacts, IDE files, Docker meta-files
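The bullets above translate into a Dockerfile shaped roughly like this (a minimal sketch, not the PR's actual file: stage layout, cache-mount paths, and the `headroom` entrypoint are illustrative):

```dockerfile
# --- builder: compile deps with gcc/g++, resolve from uv.lock ---
FROM python:3.11-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends gcc g++ \
    && rm -rf /var/lib/apt/lists/*
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
# Dependency layer: cached until the lockfile changes
COPY pyproject.toml uv.lock ./
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-install-project
# Source layer: only this rebuilds on a code change
COPY . .
RUN --mount=type=cache,target=/root/.cache/uv uv sync --frozen

# --- runtime: slim image, no build toolchain ---
FROM python:3.11-slim
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/* \
    && useradd --uid 1000 headroom
COPY --from=builder /app /app
ENV PATH="/app/.venv/bin:$PATH"
USER headroom
WORKDIR /app
ENTRYPOINT ["headroom"]
CMD ["proxy"]
```

Because the lockfile layer sits above the `COPY . .` layer, a source-only change invalidates nothing but the final layers, which is where the 37s to 4s rebuild win comes from.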

Benchmarks (cold build, Apple Silicon / OrbStack)

Metric                    Old       New      Improvement
Image size                1.11 GB   514 MB   -54%
Cold build                55.6s     32.2s    -42%
Rebuild (source change)   36.8s     3.9s     -89%

Closes #89

Test plan

  • Local build succeeds (docker build -t headroom:local .)
  • ARM64 build succeeds (docker buildx build --platform linux/arm64)
  • Container starts, /health returns 200
  • CLI --help works inside container
  • CI workflow triggers on release and publishes to GHCR

chopratejas and others added 30 commits January 30, 2026 16:27
Use asyncio.run() instead of asyncio.get_event_loop().run_until_complete()
which raises RuntimeError in Python 3.10+ when no event loop exists.
Move the network timeout skip handler to the main tests/conftest.py
so it applies to all tests, not just tests/test_memory/*.

Fixes flaky CI failures when HuggingFace model downloads timeout.
- Add mkdocs.yml with Material theme (indigo, professional)
- Add docs/index.md landing page with quick install
- Add GitHub Actions workflow for auto-deployment
- Remove old docs/README.md (replaced by index.md)
- Add web dashboard at /dashboard endpoint with real-time stats
- Simplify dashboard metrics to user-friendly terms (removed confusing
  CCR/TOIN terminology)
- Track Headroom overhead separately from total latency
- Add request logging to Bedrock paths (was missing)
- Use package version (__version__) instead of hardcoded "1.0.0"
- Add latency min/max tracking in addition to average

Dashboard shows: requests, tokens saved, cost saved, overhead,
providers breakdown, performance stats, and recent requests table.
- Add dashboard URL (http://localhost:8787/dashboard) to quickstart
- Recommend headroom-ai[all] for best compression performance
- Note that first startup downloads ML models (~500MB one-time)
## Description

Add Headroom integration with AWS Strands Agents SDK, enabling automatic
context optimization and tool output compression for Strands-based agents.

Fixes chopratejas#14

## Type of Change

- [x] New feature (non-breaking change that adds functionality)
- [x] Documentation update

## Changes Made

### Core Integration (`headroom/integrations/strands/`)

- **HeadroomHookProvider** - Implements Strands `HookProvider` interface for
  automatic tool output compression via `AfterToolCallEvent`. Compresses
  verbose tool outputs before they enter conversation context.

- **HeadroomStrandsModel** - Model wrapper that extends Strands `Model` base
  class for message-level optimization. Implements all required abstract
  methods: `stream()`, `get_config()`, `update_config()`, `structured_output()`.

- **Provider auto-detection** - Automatically detects appropriate Headroom
  provider (Anthropic, OpenAI, Google) based on wrapped Strands model type.

- **`strands-agents` as optional dependency** - Install with
  `pip install headroom-ai[strands]`

### Testing (`tests/integrations/test_strands/`)

- **Real integration tests (25 tests)** - Use actual AWS Bedrock API calls
  with Claude 3 Haiku. Skip automatically when credentials unavailable.

- **Unit tests (57 tests)** - Mock-based tests for internal logic, edge cases,
  and error handling. No credentials required.

### Demo (`examples/strands_bedrock_demo.py`)

- Interactive demo showcasing both integration patterns
- Visual before/after compression comparison with token savings
- 4 verbose tools (search, logs, database, metrics) demonstrating real savings
- Supports `--hook` and `--model` flags for individual demos

## Testing

All tests verified:

- [x] Unit tests pass (57 tests)
- [x] Integration tests pass (25 tests with real Bedrock API)
- [x] Linting passes (`ruff check .`)
- [x] Type checking passes (`mypy headroom/integrations/strands/`)
- [x] Formatting passes (`ruff format --check`)
- [x] Demo runs successfully with ~50% token savings

## Test Output

```
$ pytest tests/integrations/test_strands/ -v
=================== 82 passed in 90.09s ===================

$ ruff check headroom/integrations/strands/ --ignore E402
All checks passed!

$ mypy headroom/integrations/strands/ --ignore-missing-imports
Success: no issues found
```

## Demo Results

```
╭────────────────────────────────────────────────────────────╮
│              HeadroomHookProvider Results                  │
│────────────────────────────────────────────────────────────│
│ Tokens BEFORE compression: 51,961                          │
│ Tokens AFTER compression:  25,658                          │
│ Tokens SAVED:              26,303 (50.6%)                  │
╰────────────────────────────────────────────────────────────╯
```
DiffCompressor:
- Parse unified diff format and compress by reducing context lines
- Preserve file headers and all +/- change lines
- Score hunks by relevance (error keywords, query matches)
- Add summary line: [N files, +X -Y lines]
- Expected 30-50% savings on typical git diffs
- Wire into content router for CompressionStrategy.DIFF
- 30 tests covering parsing, compression, edge cases

hnswlib SIGILL fix:
- Move hnswlib import from module level to lazy loading
- hnswlib crashes with SIGILL (Illegal Instruction) on CPUs
  without AVX support, before Python can catch the error
- Now imports only when HNSWVectorIndex is actually used
- HNSW_AVAILABLE is checked lazily via __getattr__

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
HTMLExtractor uses trafilatura to extract main content from HTML pages,
removing scripts, styles, navigation, and ads. This achieves 94.9%
compression while preserving 98.2% recall on the Scrapinghub benchmark.

Key features:
- Automatic HTML detection in content router
- Configurable output format (markdown or text)
- Metadata extraction (title, author, date, description)
- Batch extraction support

Evaluation framework:
- OSS benchmark integration (Scrapinghub Article Extraction Benchmark)
- LLM-as-judge evaluation for QA accuracy preservation
- F1 score: 0.919 on 181-sample benchmark (baseline: 0.958)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Rename client/response variables to be unique per provider branch
to avoid type inference conflicts. Use getattr for Anthropic content
block text access to handle union types.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add native support for OpenRouter API via LiteLLM backend
- Introduce PROVIDER_REGISTRY pattern to eliminate scattered if/else blocks
- New providers can now be added with a single registry entry

Features:
- `headroom proxy --backend openrouter` routes requests to OpenRouter
- Pass-through model naming (anthropic/claude-3.5-sonnet, openai/gpt-4o, etc.)
- CLI shows provider-specific setup instructions from registry

Usage:
  export OPENROUTER_API_KEY="sk-or-v1-..."
  headroom proxy --backend openrouter

Also fixes mypy type errors in mcp_server.py
…ic tools

Add ability to exclude specific tools from compression, useful for CLI tools
like Claude Code where file/search output should be passed through unmodified.

Changes:
- Add DEFAULT_EXCLUDE_TOOLS constant with Read, Grep, Glob, Bash, WebFetch, WebSearch
- Add exclude_tools field to SmartCrusherConfig and ContentRouterConfig
- Add _build_tool_name_map() to ContentRouter for tool_call_id -> name mapping
- Skip compression for tool_result blocks from excluded tools
- Support both Anthropic (tool_use/tool_result) and OpenAI (tool_calls/tool) formats

This prevents Headroom from compressing output from tools where the user
expects to see the full, unmodified content (e.g., file reads, search results).
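A minimal sketch of the id-to-name mapping and exclusion check (helper names mirror but simplify `_build_tool_name_map`; message shapes follow the OpenAI and Anthropic tool formats):

```python
DEFAULT_EXCLUDE_TOOLS = {"Read", "Grep", "Glob", "Bash"}


def build_tool_name_map(messages):
    """Map each tool_call_id to the tool name that produced it."""
    name_map = {}
    for msg in messages:
        # OpenAI format: assistant message carrying tool_calls
        for call in msg.get("tool_calls") or []:
            name_map[call["id"]] = call["function"]["name"]
        # Anthropic format: content blocks of type tool_use
        content = msg.get("content")
        if isinstance(content, list):
            for block in content:
                if block.get("type") == "tool_use":
                    name_map[block["id"]] = block["name"]
    return name_map


def should_compress(tool_call_id, name_map, exclude=DEFAULT_EXCLUDE_TOOLS):
    """Skip compression for results produced by excluded tools."""
    return name_map.get(tool_call_id) not in exclude
```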

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add pytest.importorskip("trafilatura") to HTML extractor test modules
to skip tests gracefully when the optional trafilatura dependency is
not installed. This fixes CI failures in the base test matrix that
doesn't include the html extras.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The previous fix attempted to lazily import hnswlib but calling
_check_hnswlib_available() still triggered the import, which crashed
with SIGILL on CPUs without AVX support before Python could catch it.

Fix by using subprocess to safely probe for hnswlib availability:
- Import AND create an Index in a subprocess to catch SIGILL at both
  import time and first use of AVX instructions
- If subprocess succeeds, then import in main process
- Add debug logging for all failure modes (timeout, crash, etc.)
- Isolates any crash to the subprocess, keeping test process alive
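The subprocess probe can be sketched as follows (function name and probe parameters are illustrative, not the PR's exact code):

```python
import subprocess
import sys


def probe_hnswlib(timeout=10.0):
    """Probe hnswlib in a throwaway subprocess. A SIGILL on a non-AVX
    CPU kills only the child; the parent sees a nonzero exit code
    instead of crashing. Creating an Index exercises the AVX paths
    that a bare import might not touch."""
    code = "import hnswlib; hnswlib.Index(space='l2', dim=16)"
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # treat a hung probe as unavailable
    return result.returncode == 0
```

Only if the probe succeeds does the main process perform the real import.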

AI review: code-reviewer (1 iteration)
Adversarial review: code-critic (addressed logging, more robust probe)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add pytestmark skip conditions to memory test modules that depend on
hnswlib (core_operations, factory, easy). The subprocess probe for
hnswlib correctly detects unavailability on some platforms (like
Python 3.13 CI runners), but these tests were still trying to run
and failing with ImportError.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## What this PR fixes

1. **CI Python 3.12 failure**: Added skip decorator to `TestLocalBackend`
   in `test_memory_system.py` - these tests require hnswlib which is not
   available on all CI runners.

2. **Missing test coverage**: Added 6 tests for the `exclude_tools` feature
   in `test_content_router.py`. Tests use existing helper functions
   `generate_python_code()`, `generate_json_data()`, and
   `generate_search_results()` defined at lines 57-95 of the same file.

3. **Anthropic/OpenAI inconsistency**: Fixed `_process_content_blocks()`
   to add `router:excluded:tool` marker for Anthropic format, matching
   the OpenAI format behavior at line 1157.

4. **Dead code removal**: Removed unused `exclude_tools` field from
   `SmartCrusherConfig` - the actual implementation uses
   `ContentRouterConfig.exclude_tools` in content_router.py.

AI review: code-reviewer (2 iterations), adversarial-reviewer (2 iterations)
Issues fixed: missing test coverage, format inconsistency, dead code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
WebFetch and WebSearch should NOT be excluded by default because:
1. Web content is Headroom's sweet spot - lots of noise (nav, ads, boilerplate)
2. CCR allows retrieval if LLM needs original content
3. Excluding them undermines the core value proposition

DEFAULT_EXCLUDE_TOOLS now only contains local file/code tools:
- Read, Glob, Grep, Bash (and lowercase variants)

These local tools return precise content (line numbers, paths, code)
where exact fidelity matters immediately. Web tools benefit from
compression and can use CCR for on-demand retrieval.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…e-tools-compression

feat(compression): add exclude_tools to bypass compression for specific tools
- Add `question` parameter to LLMLinguaCompressor.compress() for QA-aware
  token selection (passes to LLMLingua-2's compress_prompt)
- Flow `question` parameter through ContentRouter compression pipeline
- Enable ContentRouter in default pipeline (was missing, causing 0% compression)
- Add `content_router_enabled` config option to HeadroomConfig

This improves compression accuracy for QA tasks by allowing LLMLingua-2 to
preserve tokens relevant to answering the given question.
Root Cause:
The `find_tool_units()` function in `parser.py` only detected OpenAI
format tool calls (assistant.tool_calls + role="tool" messages), not
Anthropic format (assistant.content[type=tool_use] + user.content[type=tool_result]).

This caused RollingWindow and IntelligentContext transforms to treat
Anthropic tool_use and tool_result as separate, independently droppable
messages. When context needed to be trimmed, the assistant message with
tool_use could be dropped while keeping the user message with tool_result,
creating orphaned tool_result blocks.

When sent to the Anthropic API, this produces the error:
"unexpected tool_use_id found in tool_result blocks"

Changes:
1. parser.py: Extended `find_tool_units()` to detect Anthropic format:
   - Scan user messages for content blocks with type="tool_result"
   - Scan assistant messages for content blocks with type="tool_use"
   - Map tool_use_id to corresponding response message indices

2. rolling_window.py: Extended `_get_protected_indices()` to protect
   Anthropic format tool pairs:
   - Detect tool_use blocks in assistant.content
   - Find and protect matching user messages with tool_result blocks

3. tests/test_parser.py: Added 4 new tests for Anthropic format:
   - test_anthropic_format_tool_use_and_result
   - test_anthropic_format_multiple_tool_uses
   - test_anthropic_format_orphaned_tool_result
   - test_mixed_openai_and_anthropic_formats

Test Results: 82 passed (including 4 new Anthropic format tests)
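The pairing logic can be sketched like this (a simplified stand-in for `find_tool_units`, covering only the Anthropic format):

```python
def find_anthropic_tool_units(messages):
    """Pair assistant tool_use blocks with the user tool_result
    messages that answer them, keyed by tool_use id. Orphaned
    tool_result blocks (no matching tool_use) are ignored."""
    use_index = {}  # tool_use id -> index of the assistant message
    units = {}      # tool_use id -> (assistant_idx, result_idx)
    for i, msg in enumerate(messages):
        content = msg.get("content")
        if not isinstance(content, list):
            continue
        for block in content:
            if msg.get("role") == "assistant" and block.get("type") == "tool_use":
                use_index[block["id"]] = i
            elif msg.get("role") == "user" and block.get("type") == "tool_result":
                uid = block.get("tool_use_id")
                if uid in use_index:
                    units[uid] = (use_index[uid], i)
    return units
```

A context manager can then protect or drop each `(assistant_idx, result_idx)` pair atomically, which is exactly what prevents the orphaned-tool_result API error.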

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ropic-tool-unit-detection

fix: Handle Anthropic format tool_use/tool_result as atomic units
…ager

Follows up on PR chopratejas#19 which fixed RollingWindow but missed IntelligentContextManager,
the default context manager used by the proxy.

Changes:
1. intelligent_context.py: Extended `_get_protected_indices()` to handle Anthropic format:
   - Scan assistant.content for type="tool_use" blocks
   - Protect user messages containing type="tool_result" blocks with matching tool_use_id

2. test_intelligent_context.py: Added TestAnthropicFormatToolProtection class with 5 tests:
   - test_anthropic_tool_result_protected_when_tool_use_protected
   - test_anthropic_tool_units_dropped_atomically
   - test_anthropic_multiple_tools_same_message_atomic
   - test_anthropic_format_no_api_error_scenario (verifies bug fix)
   - test_mixed_openai_and_anthropic_formats

3. test_rolling_window.py: Added matching TestAnthropicFormatToolProtection class with 5 tests
   - Added skip decorator for CI/CD when OPENAI_API_KEY is not set

This ensures both context managers (RollingWindow and IntelligentContextManager) correctly
handle Anthropic's native tool_use/tool_result format, preventing the
"unexpected tool_use_id found in tool_result blocks" API error.
…ntext-anthropic-format

fix: Extend Anthropic format tool protection to IntelligentContextManager
- Add MemoryToolAdapter for unified memory across providers
- Anthropic: Uses native memory tool (memory_20250818) for subscription safety
- OpenAI/Gemini/Others: Uses function calling format
- All providers share the same semantic vector store backend
- Simplify CLI to single --memory flag with auto-detection
- Add proper resource cleanup (close methods) to fix test isolation
- Update README with memory documentation
Implements comprehensive memory tracking for all in-memory components:

- Add MemoryTracker singleton with ComponentStats, ProcessStats, MemoryReport
- Add get_memory_stats() to CompressionStore, BatchContextStore,
  GraphStore, HNSWVectorIndex
- Add /debug/memory API endpoint for runtime monitoring

Components tracked:
- compression_store: CCR compressed tool outputs
- batch_context_store: Batch API request contexts
- graph_store: Knowledge graph entities and relationships
- vector_index: HNSW vector embeddings
- semantic_cache: Response cache
- request_logger: Request metadata

Includes 47 tests (unit + integration) with real API calls.
…ervability

Add memory observability system (Phase 1)
Replace unbounded InMemoryGraphStore with SQLite-backed implementation:
- Persistent storage survives proxy restarts
- Memory bounded by configurable SQLite page cache (default 8MB)
- Same async interface as InMemoryGraphStore (drop-in replacement)
- LocalBackend now uses SQLiteGraphStore by default (graph_persist=True)

New files:
- headroom/memory/adapters/sqlite_graph.py: SQLite graph store implementation
- tests/test_sqlite_graph_store.py: 37 comprehensive tests

Key features:
- O(log n) lookups via database indexes
- Case-insensitive entity name lookup per user
- BFS subgraph traversal and shortest path finding
- CASCADE delete for entity relationships
- MemoryTracker integration via get_memory_stats()
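A minimal `sqlite3` sketch of the properties listed above (schema and names are illustrative, not the PR's actual `sqlite_graph.py`):

```python
import sqlite3


def open_graph(path=":memory:"):
    """Open a bounded, persistent entity/relationship store with
    indexed lookups, case-insensitive names per user, and
    cascade-deleted relationships."""
    db = sqlite3.connect(path)
    db.execute("PRAGMA foreign_keys = ON")  # required for CASCADE
    db.execute("""CREATE TABLE IF NOT EXISTS entities (
        id INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL,
        name TEXT NOT NULL COLLATE NOCASE,
        UNIQUE (user_id, name))""")
    db.execute("""CREATE TABLE IF NOT EXISTS relations (
        src INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
        dst INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
        kind TEXT NOT NULL)""")
    # Index backs the O(log n) per-user name lookups
    db.execute("""CREATE INDEX IF NOT EXISTS idx_entities_user_name
                  ON entities(user_id, name)""")
    return db
```

`COLLATE NOCASE` on the name column gives the case-insensitive lookup; `ON DELETE CASCADE` removes an entity's relationships with it.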
…ervability

Add SQLiteGraphStore for bounded, persistent graph storage
chopratejas and others added 25 commits March 30, 2026 16:25
…d tool

Bedrock requires role=tool messages immediately after assistant tool_calls.
The previous fix inserted a user text message in between when the message
contained both text and tool_result blocks, breaking the pairing.

Drop text alongside tool_result (Claude Code never sends it in practice).
Added ordering regression tests for the Bedrock constraint.
The /v1/responses handler was passing through without compression,
meaning Codex CLI users got zero savings. Now converts Responses API
items (function_call, function_call_output, reasoning, message) to
Chat Completions format, runs the existing pipeline, and converts back.

- New: headroom/proxy/responses_converter.py — pure conversion functions
- 21 unit tests + 3 integration tests (tested with real OpenAI API)
- Preserves reasoning items, images, unknown types verbatim
- Skips compression when previous_response_id is set
- 27% compression on real Codex-pattern payloads (500 records → 14K tokens saved)

Closes chopratejas#73
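The item mapping can be sketched as follows (a hedged approximation of `responses_converter.py`; item shapes follow the Responses API, but the helper name and passthrough representation are invented):

```python
def responses_items_to_chat(items):
    """Convert Responses-API items to Chat Completions messages.
    Reasoning, image, and unknown items are returned separately so
    they can be reattached verbatim after compression."""
    messages, passthrough = [], []
    for item in items:
        t = item.get("type")
        if t == "function_call":
            messages.append({"role": "assistant", "tool_calls": [{
                "id": item["call_id"],
                "type": "function",
                "function": {"name": item["name"],
                             "arguments": item.get("arguments", "{}")},
            }]})
        elif t == "function_call_output":
            messages.append({"role": "tool",
                             "tool_call_id": item["call_id"],
                             "content": item["output"]})
        elif t == "message":
            messages.append({"role": item.get("role", "user"),
                             "content": item.get("content", "")})
        else:
            passthrough.append(item)  # reasoning, images, unknown types
    return messages, passthrough
```

After the existing pipeline compresses the chat-format messages, the inverse conversion rebuilds Responses items and reinserts the passthrough entries.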
Private scripts with credentials should not be tracked in git.
Codex v0.117.0+ with newer models uses WebSocket instead of HTTP POST
for the Responses API. Added @app.websocket("/v1/responses") handler that:
- Accepts ws:// connections and forwards to wss://api.openai.com
- Compresses input on first message using existing pipeline
- Relays all response events bidirectionally
- Handles SSL (certifi), graceful disconnect, missing websockets lib

Tested with real OpenAI API: basic text, large tool outputs (200 records),
parallel function calls, instructions preservation.

Addresses chopratejas#79
KompressCompressor now tries ONNX Runtime first (156MB INT8 model),
falls back to PyTorch only if ONNX unavailable. No torch needed for
text compression — just onnxruntime (~50MB) + transformers (tokenizer).

Changes:
- Add onnxruntime + transformers to [proxy] extra in pyproject.toml
- Add _OnnxModel wrapper with get_scores/get_keep_mask interface
- _load_kompress() tries ONNX first, falls back to PyTorch
- is_kompress_available() returns True if EITHER backend available
- compress() handles both numpy (ONNX) and tensor (PyTorch) outputs

Dependency impact:
  Before: pip install headroom-ai[proxy] → no text compression
  After:  pip install headroom-ai[proxy] → Kompress ONNX INT8 (156MB)
  [ml] extra still available for full PyTorch (600MB, GPU support)
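The ONNX-first loading order reduces to a simple import fallback (names here are illustrative, not the actual `_load_kompress()` internals):

```python
def pick_backend():
    """Try ONNX Runtime first, fall back to PyTorch, and report None
    when neither is installed. Mirrors the loading order above."""
    try:
        import onnxruntime  # noqa: F401
        return "onnx"
    except ImportError:
        pass
    try:
        import torch  # noqa: F401
        return "torch"
    except ImportError:
        return None


def is_available():
    # Available if EITHER backend can be imported
    return pick_backend() is not None
```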
- WebSocket proxy for /v1/responses (Codex gpt-5.4+ support)
- Kompress ONNX INT8 text compression (no torch needed, ~100MB vs 1.5GB)
- Updated Discord invite link
- Tool_result ordering fix for Bedrock
OpenAI's WebSocket Responses API requires the header
'OpenAI-Beta: responses-api=v1'. Without it, the server returns HTTP 500
on the WebSocket upgrade. Also forward all client headers (not just auth)
to upstream, skipping only hop-by-hop headers.

Tested with real OpenAI API: basic text, large tool output compression,
all working through WebSocket proxy.

Fixes chopratejas#82, updates chopratejas#79
…history

feat: persist proxy savings history
savings_usd is now tokens_saved * model list input price (monotonic,
transparent). Removed non-monotonic moving-average repricing and
confusing cost_without_headroom counterfactual.

Dashboard hero shows "Compression Savings" with clear subtitle.
Savings Breakdown section shows compression, cache, and RTK separately
with distinct colors and no scope mixing.

All beacon/telemetry fields preserved. RTK token counts still reported.

Fixes chopratejas#83
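The new pricing rule is deliberately simple. A sketch, assuming list prices are quoted per million input tokens (a convention chosen here, not stated in the PR):

```python
def savings_usd(tokens_saved, input_price_per_mtok):
    """Monotonic savings: tokens saved times the model's list input
    price. No moving averages, so the number never decreases as
    more tokens are saved."""
    return tokens_saved / 1_000_000 * input_price_per_mtok
```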
…0.5.17

- Fix beacon spam: file lock ensures only one beacon per proxy regardless
  of worker count. Workers > 1 caused N beacons firing N rows per cycle.
- Beacon upsert: on_conflict=session_id prevents duplicate rows.
- Beacon stop() guard: skip final report if uptime < 2 minutes.
- Fix dashboard cost: savings_usd now uses model list price (monotonic),
  not moving average. Separate breakdown for compression/cache/rtk.

Fixes chopratejas#83
feat: Add Anthropic url overrides via env vars and cli flags
transforms_summary is a counted dict (e.g. {"router:tool_result:text": 4})
alongside the raw transforms_applied list. Cleaner display for users
without losing the raw data for debugging.
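The counted dict is essentially a `collections.Counter` over the raw list:

```python
from collections import Counter


def summarize_transforms(transforms_applied):
    """Collapse the raw transform list into a counted dict while the
    caller keeps the original list for debugging."""
    return dict(Counter(transforms_applied))
```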
Move ProxyConfig, RequestLog, CacheEntry, RateLimitState to
headroom/proxy/models.py. Re-exported from server.py for backward
compatibility — all existing imports continue to work.

server.py: 8835 → 8643 lines (-192)
models.py: 199 lines (new)

Part of the server.py split effort to improve maintainability.
…nd forwarding

- Fix WS /v1/responses: forward Sec-WebSocket-Protocol (subprotocol) to
  upstream instead of stripping it — root cause of Codex HTTP 500 errors
- Fix WS relay: handle binary messages properly instead of crashing on
  .decode(), add debug logging instead of silent except:pass
- Add Authorization header fallback from OPENAI_API_KEY env var for WS
- Extract response body from websockets InvalidStatus for error debugging
- Fix streaming /v1/responses: pass optimized_tokens (not original_tokens
  twice) so compression savings appear in streaming metrics
- Fix hardcoded provider="bedrock" in 4 metrics/log locations — now uses
  self.anthropic_backend.name so LiteLLM backends report correctly
- Forward --backend, --anyllm-provider, --region flags from wrap commands
  (codex, aider) to the proxy subprocess via _start_proxy()
- Forward API key from request headers to LiteLLM acompletion() calls
- Forward region to Vertex AI (vertex_location) not just Bedrock
- Redesign proxy startup banner: show routing table instead of misleading
  "Backend: Anthropic" label

Closes chopratejas#86

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CacheAligner was hardcoded to enabled=True in the pipeline despite
the config default being False. It extracts dynamic content from the
system prompt middle and reinserts at the end, which:
1. CHANGES the prefix bytes → provider cache miss (loses 90% discount)
2. ADDS ~341 tokens of formatting overhead per request
3. Net effect: more expensive, worse caching

Now uses the config default (enabled=False). The CacheAligner still
exists for users who explicitly opt in, but the proxy no longer
forces it on.
Manual ssl.create_default_context() + certifi doesn't load the Windows
system certificate store, causing HTTP 500 on wss:// connections to
OpenAI. Using ssl=True lets the websockets library handle SSL natively
with proper cross-platform cert store loading.
Multi-stage build with uv, non-root user, and layer-optimized caching.

- Multi-stage: build deps (gcc/g++) stay in builder, runtime is clean slim
- uv instead of pip: uses existing uv.lock for deterministic fast installs
- Layer caching: deps cached independently from source (rebuild 37s -> 4s)
- Non-root: runs as headroom:1000 instead of root
- Image size: 1.11GB -> 514MB (-54%)
- Add CI workflow for multi-arch (amd64+arm64) GHCR publishing
- Expand .dockerignore to exclude JS artifacts, IDE files, Docker files

Closes chopratejas#89
@chopratejas
Owner

Thanks for the changes.

Regarding the UI: we already have a bare-bones dashboard in Headroom.

Do you think we should augment that?

@pratikbin pratikbin force-pushed the feat/production-dockerfile branch from 52550e4 to 653303f on April 2, 2026 19:24
@pratikbin
Contributor Author

pratikbin commented Apr 2, 2026

Yes, I like the little dashboard, so I built one. If you can augment that, it would be wonderful.

mine: [dashboard screenshot]

ahh i hate git-filter-repo

@pratikbin pratikbin closed this Apr 2, 2026


Development

Successfully merging this pull request may close these issues.

[FEATURE] official container image

10 participants